Abstract

In this paper the sequential prediction problem with expert advice is
considered for the case where the losses suffered by the experts at each
step cannot be bounded in advance. We present a modification of the
Kalai and Vempala algorithm of following the perturbed leader in which the
weights depend on the past losses of the experts. New notions of the volume
and the scaled fluctuation of a game are introduced. We present a
probabilistic algorithm protected from unrestrictedly large one-step
losses. This algorithm has optimal performance in the case when the
scaled fluctuations of the one-step losses of the experts of the pool tend
to zero.

Keywords: prediction with expert advice, follow the perturbed 
leader, unbounded losses, adaptive learning rate, expected bounds, 
Hannan consistency, online sequential prediction 

1 Introduction 

Experts algorithms are used for online prediction, repeated decision making,
and repeated game playing. Starting with the Weighted Majority Algorithm
(WM) of Littlestone and Warmuth [11] and Vovk's [17] Aggregating Algorithm,
the theory of Prediction with Expert Advice has developed rapidly in
recent years. Most authors have concentrated on predicting binary
sequences and have used specific (usually convex) loss functions, such as
the absolute, square and logarithmic losses. A survey can be found in the
book of Lugosi and Cesa-Bianchi [12]. Arbitrary losses are less common and,
as a rule, they are supposed to be bounded in advance (see the well-known
Hedge Algorithm of Freund and Schapire [6], Normal Hedge [2] and other
algorithms).

*This paper is an extended version of the ALT 2009 conference paper [19].
^This research was partially supported by Russian foundation for fundamental
research: 09-07-00180-a and 09-01-00709a.

In this paper, we consider a different general approach, the "Follow
the Perturbed Leader" (FPL) algorithm, now called Hannan's algorithm
[7], [10], [12]. Under this approach we only choose the decision that
has fared the best in the past, the leader. In order to cope with an
adversary, some randomization is implemented by adding a perturbation to the
total loss prior to selecting the leader. The goal of the learner's
algorithm is to perform almost as well as the best expert in hindsight in
the long run. The resulting FPL algorithm has the same performance
guarantees as WM-type algorithms for fixed learning rate and bounded
one-step losses, save for a factor $\sqrt{2}$.

Prediction with Expert Advice considered in this paper proceeds as
follows. We are asked to perform sequential actions at times
$t = 1, 2, \dots, T$. At each time step $t$, experts $i = 1, \dots, N$
receive results of their actions in the form of their losses $s^i_t$, which
are arbitrary real numbers.

At the beginning of step $t$ Learner, observing the cumulative losses
$s^i_{1:t-1} = s^i_1 + \dots + s^i_{t-1}$ of all experts $i = 1, \dots, N$,
makes a decision to follow one of these experts, say Expert $i$. At the end
of step $t$ Learner receives the same loss $s^i_t$ as Expert $i$ at step $t$
and suffers Learner's cumulative loss

    $s_{1:t} = s_{1:t-1} + s^i_t$.

In the traditional framework, we suppose that one-step losses of all
experts are bounded, for example, $0 \le s^i_t \le 1$ for all $i$ and $t$.

A well-known simple example of a game with two experts shows that
Learner can perform much worse than each expert: let the current losses
of two experts on steps $t = 0, 1, \dots, 6$ be
$s^1_{0,1,2,3,4,5,6} = (\frac{1}{2}, 0, 1, 0, 1, 0, 1)$ and
$s^2_{0,1,2,3,4,5,6} = (0, 1, 0, 1, 0, 1, 0)$. Evidently, the "Follow Leader"
algorithm always chooses the wrong prediction.
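
The failure of the deterministic leader-following rule on this example can
be checked directly. The following Python sketch is an illustration only:
the function name `follow_the_leader`, the tie-breaking rule (the lowest
index wins) and the initial loss 1/2 of the first expert are assumptions
made for the example.

```python
def follow_the_leader(losses):
    """Deterministic Follow the Leader; losses[t][i] is the loss of expert i at step t."""
    n = len(losses[0])
    cum = [0.0] * n          # cumulative losses of the experts so far
    learner = 0.0            # cumulative loss of the learner
    for step in losses:
        # Follow the expert with the smallest past cumulative loss
        # (ties are broken in favour of the lowest index: an assumption).
        leader = min(range(n), key=lambda i: cum[i])
        learner += step[leader]
        for i in range(n):
            cum[i] += step[i]
    return learner, cum


# Step 0 has losses (1/2, 0); afterwards the losses alternate (0, 1), (1, 0).
losses = [(0.5, 0.0)] + [(0.0, 1.0), (1.0, 0.0)] * 3
learner_loss, expert_losses = follow_the_leader(losses)
print(learner_loss, expert_losses)
```

On these seven steps the learner accumulates the loss 6.5, while the two
experts accumulate 3.5 and 3: after step 0 the learner is charged the full
loss 1 at every step.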

When the experts' one-step losses are bounded, this problem has been
solved using randomization of the experts' cumulative losses. The method
of following the perturbed leader was discovered by Hannan [7]. Kalai and
Vempala [10] rediscovered this method and published a simple proof of the
main result of Hannan. They called an algorithm of this type FPL (Following
the Perturbed Leader).

The FPL algorithm outputs the prediction of an expert $i$ which minimizes

    $s^i_{1:t-1} - \frac{1}{\epsilon}\xi^i_t$,

where $\xi^i_t$, $i = 1, \dots, N$, $t = 1, 2, \dots$, is a sequence of i.i.d.
random variables distributed according to the exponential distribution with
the density $p(x) = \exp\{-x\}$, and $\epsilon$ is a learning rate.
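
A minimal Python sketch of this selection rule with a fixed learning rate
is given below. The function names `fpl_choose` and `run_fpl`, the use of a
seeded generator, and the drawing of fresh perturbations at every step are
assumptions made for the illustration; this is not the authors'
implementation.

```python
import random


def fpl_choose(cum_losses, eps, rng):
    """Return the index i minimizing s^i_{1:t-1} - xi^i / eps, with xi^i ~ Exp(1)."""
    perturbed = [s - rng.expovariate(1.0) / eps for s in cum_losses]
    return min(range(len(perturbed)), key=perturbed.__getitem__)


def run_fpl(losses, eps, seed=0):
    """Play FPL with a fixed learning rate eps; losses[t][i] is the loss of expert i."""
    rng = random.Random(seed)
    cum = [0.0] * len(losses[0])   # cumulative losses of the experts
    total = 0.0                    # cumulative loss of the learner
    for step in losses:
        i = fpl_choose(cum, eps, rng)
        total += step[i]
        cum = [c + s for c, s in zip(cum, step)]
    return total, cum


total, cum = run_fpl([(0.0, 1.0), (1.0, 0.0)] * 50, eps=0.1, seed=1)
print(total, cum)
```

A small value of `eps` makes the perturbation large relative to the
cumulative losses, which is what prevents an adversary from exploiting the
deterministic choice of the leader.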

Kalai and Vempala [10] show that the expected cumulative loss of the
FPL algorithm has the upper bound

    $E(s_{1:t}) \le (1+\epsilon)\min_{i=1,\dots,N} s^i_{1:t} + \frac{\log N}{\epsilon}$,

where the learning rate $\epsilon$ is a positive real number such that
$0 < \epsilon < 1$, and $N$ is the number of experts.

Hutter and Poland [8], [9] presented a further development of the FPL
algorithm for a countable class of experts, arbitrary weights and an
adaptive learning rate. Also, the FPL algorithm is usually considered for
bounded one-step losses: $0 \le s^i_t \le 1$ for all $i$ and $t$. Using a
variable learning rate, an optimal upper bound was obtained in [9]:

    $E(s_{1:T}) \le \min_{i=1,\dots,N} s^i_{1:T} + 2\sqrt{2T\ln N}$.

Most papers on prediction with expert advice either consider bounded losses
or assume the existence of a specific loss function (see [12]). We allow
losses at any step to be unbounded. The notion of a specific loss function
is not used.

The setting allowing unbounded one-step losses does not have wide coverage
in the literature; we can only refer the reader to [1], [3], [14].

Poland and Hutter [13] have studied the games where one-step losses of
all experts at each step $t$ are bounded from above by an increasing sequence
$B_t$ given in advance. They presented a learning algorithm which is
asymptotically consistent for $B_t = t^{1/16}$.

Allenberg et al. [1] have considered polynomially bounded one-step losses
for a modified version of the Littlestone and Warmuth algorithm [11] under
partial monitoring. In the full information case, their algorithm has the
expected regret $2\sqrt{N\ln N}\,(T+1)^{\frac{1}{2}(1+a+\beta)}$ in the case
where one-step losses of all experts $i = 1, 2, \dots, N$ at each step $t$
have the bound $(s^i_t)^2 \le t^a$, where $a > 0$, and $\beta > 0$ is a
parameter of the algorithm. They have proved that this algorithm is Hannan
consistent if

    $\max_{1\le i\le N}\frac{1}{T}\sum_{t=1}^{T}(s^i_t)^2 < cT^a$

for all $T$, where $c > 0$ and $0 < a < 1$.

In this paper, we also consider the case where the loss grows "faster than
polynomial, but slower than exponential". A motivating example, where
losses of the experts cannot be bounded in advance, is given in Section 4.

We present a modification of the Kalai and Vempala [10] algorithm of
following the perturbed leader (FPL) for the case of unrestrictedly large
one-step expert losses $s^i_t$ not bounded in advance:
$s^i_t \in (-\infty, +\infty)$. This algorithm uses adaptive weights
depending on past cumulative losses of the experts.

The full information case is considered in this paper. We analyze the
asymptotic consistency of our algorithms using nonstandard scaling. We
introduce new notions of the volume of a game
$v_t = v_0 + \sum_{j=1}^{t}\max_i |s^i_j|$ and the scaled fluctuation of the
game $\mathrm{fluc}(t) = \Delta v_t / v_t$, where
$\Delta v_t = v_t - v_{t-1}$ and $v_0$ is a nonnegative constant.

We show in Theorem 1 that the algorithm of following the perturbed
leader with adaptive weights constructed in Section 3 is asymptotically
consistent in the mean in the case where $v_t \to \infty$ and
$\Delta v_t = o(v_t)$ as $t \to \infty$ with a computable bound.
Specifically, if $\mathrm{fluc}(t) \le \gamma(t)$ for all $t$, where
$\gamma(t)$ is a computable function such that $\gamma(t) = o(1)$ as
$t \to \infty$, our algorithm has the expected regret

    $2\sqrt{(6+\epsilon)(1+\ln N)}\sum_{t=1}^{T}(\gamma(t))^{1/2}\Delta v_t$,

where $\epsilon > 0$ is a parameter of the algorithm.

In the case where all losses are nonnegative, $s^i_t \in [0, +\infty)$, we
obtain the regret

    $2\sqrt{(2+\epsilon)(1+\ln N)}\sum_{t=1}^{T}(\gamma(t))^{1/2}\Delta v_t$.

In particular, this algorithm is asymptotically consistent (in the mean)
in a modified sense

    $\limsup_{T\to\infty}\frac{1}{v_T}E(s_{1:T} - \min_{i=1,\dots,N} s^i_{1:T}) \le 0$,    (1)

where $s_{1:T}$ is the total loss of our algorithm on steps $1, 2, \dots, T$,
and $E(s_{1:T})$ is its expectation.

Proposition 1 of Section 2 shows that if the condition $\Delta v_t = o(v_t)$
is violated, the cumulative loss of any probabilistic prediction algorithm
can be much more than the loss of the best expert of the pool.

In Section 3 we present some sufficient conditions under which our
learning algorithm is Hannan consistent (see footnote 1).

In particular, Corollary 1 of Theorem 1 says that our algorithm is
asymptotically consistent (in the modified sense) in the case when one-step
losses of all experts at each step $t$ are bounded by $t^a$, where $a$ is a
positive real number. We prove this result under an extra assumption that
the volume of the game grows slowly, $\liminf_{t\to\infty} v_t/t^{a+\delta} > 0$,
where $\delta > 0$ is arbitrary. Corollary 1 shows that our algorithm is also
Hannan consistent when $\delta > \frac{1}{2}$.

At the end of Section 3 we consider some applications of our algorithm
for the case of standard time-scaling.

In Section 4 we consider an application of our algorithm for constructing
an arbitrage strategy in a game of buying and selling shares of a stock on
a financial market. We analyze this game in the decision theoretic
online learning (DTOL) framework [6]. We introduce a Learner that computes
a weighted average of different strategies with unbounded gains and losses.
To change from the follow-the-leader framework to DTOL we derandomize our
FPL algorithm.

2 Games of prediction with expert advice with 
unbounded one-step losses 

We consider a game of prediction with expert advice with arbitrary
unbounded one-step losses. At each step $t$ of the game, all $N$ experts
receive one-step losses $s^i_t \in (-\infty, +\infty)$, $i = 1, \dots, N$,
and the cumulative loss of the $i$th expert after step $t$ is equal to

    $s^i_{1:t} = s^i_{1:t-1} + s^i_t$.

1 This means that (1) holds with probability 1, where E is omitted.

A probabilistic learning algorithm of choosing an expert outputs at any step
$t$ the probabilities $P\{I_t = i\}$ of following the $i$th expert given the
cumulative losses $s^i_{1:t-1}$ of the experts $i = 1, \dots, N$ in hindsight.

Probabilistic algorithm of choosing an expert.
FOR t = 1, ..., T

Given past cumulative losses of the experts $s^i_{1:t-1}$, $i = 1, \dots, N$,
choose an expert $i$ with probability $P\{I_t = i\}$.

Receive the one-step losses $s^i_t$ of the experts at step $t$ and suffer the
one-step loss $s_t = s^i_t$ of the master algorithm.
ENDFOR

The performance of this probabilistic algorithm is measured by its expected
regret

    $E(s_{1:T} - \min_{i=1,\dots,N} s^i_{1:T})$,

where the random variable $s_{1:T}$ is the cumulative loss of the master
algorithm, $s^i_{1:T}$, $i = 1, \dots, N$, are the cumulative losses of the
experts algorithms and $E$ is the mathematical expectation (with respect to
the probability distribution generated by the probabilities $P\{I_t = i\}$,
$i = 1, \dots, N$, on the first $T$ steps of the game).

In the case of bounded one-step expert losses, $s^i_t \in [0, 1]$, and a
convex loss function, the well-known learning algorithms have expected regret
$O(\sqrt{T\log N})$ (see Lugosi, Cesa-Bianchi [12]).

A probabilistic algorithm is called asymptotically consistent in the mean
if

    $\limsup_{T\to\infty}\frac{1}{T}E(s_{1:T} - \min_{i=1,\dots,N} s^i_{1:T}) \le 0$.    (2)

A probabilistic learning algorithm is called Hannan consistent if

    $\limsup_{T\to\infty}\frac{1}{T}\left(s_{1:T} - \min_{i=1,\dots,N} s^i_{1:T}\right) \le 0$    (3)


almost surely, where $s_{1:T}$ is its random cumulative loss.

In this section we study the asymptotic consistency of probabilistic
learning algorithms in the case of unbounded one-step losses.

Notice that when $0 \le s^i_t \le 1$ all expert algorithms have total loss
$\le T$ on the first $T$ steps. This is not true for the unbounded case, and
there is no reason to divide the expected regret (2) by $T$. We replace the
standard time scaling in (2) and (3) by a new scaling based on a new notion
of the volume of a game. We modify the definition (2) of the normalized
expected regret as follows. Define the volume of a game at step $t$

    $v_t = v_0 + \sum_{j=1}^{t}\max_i |s^i_j|$,

where $v_0$ is a nonnegative constant. Evidently, $v_{t-1} \le v_t$ for all $t$.

A probabilistic learning algorithm is called asymptotically consistent in
the mean (in the modified sense) in a game with $N$ experts if

    $\limsup_{T\to\infty}\frac{1}{v_T}E(s_{1:T} - \min_{i=1,\dots,N} s^i_{1:T}) \le 0$.    (4)

A probabilistic algorithm is called Hannan consistent (in the modified sense)
if

    $\limsup_{T\to\infty}\frac{1}{v_T}\left(s_{1:T} - \min_{i=1,\dots,N} s^i_{1:T}\right) \le 0$    (5)

almost surely.

Notice that the notions of asymptotic consistency in the mean and Hannan
consistency may be non-equivalent for unbounded one-step losses.
A game is called non-degenerate if $v_t \to \infty$ as $t \to \infty$.
Denote $\Delta v_t = v_t - v_{t-1}$. The number

    $\mathrm{fluc}(t) = \frac{\Delta v_t}{v_t} = \frac{\max_i |s^i_t|}{v_t}$    (6)

is called the scaled fluctuation of the game at step $t$.
By definition $0 \le \mathrm{fluc}(t) \le 1$ for all $t$ (put $0/0 = 0$).
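
For concreteness, the volume and the scaled fluctuation can be computed
from a table of one-step losses as in the following Python sketch. The
function name `volume_and_fluctuation` and the sample losses are
hypothetical and serve only as an illustration of the definitions above.

```python
def volume_and_fluctuation(losses, v0=0.0):
    """Return the lists (v_1, ..., v_T) and (fluc(1), ..., fluc(T)).

    losses[t-1][i] is the one-step loss of expert i at step t.
    """
    volumes, flucs = [], []
    v = v0
    for step in losses:
        dv = max(abs(s) for s in step)   # Delta v_t = max_i |s^i_t|
        v += dv                          # v_t = v_{t-1} + Delta v_t
        volumes.append(v)
        flucs.append(dv / v if v > 0 else 0.0)   # convention 0/0 = 0
    return volumes, flucs


volumes, flucs = volume_and_fluctuation(
    [(1.0, -2.0), (0.5, 0.5), (-3.0, 1.0)], v0=1.0)
print(volumes)   # [3.0, 3.5, 6.5]
print(flucs)
```

Here the third step has a large loss relative to the accumulated volume, so
the scaled fluctuation jumps back up to 3/6.5 after dropping to 1/7.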

The following simple proposition shows that each probabilistic learning
algorithm is not asymptotically optimal in some game such that
$\mathrm{fluc}(t) \not\to 0$ as $t \to \infty$. For simplicity, we consider
the case of two experts and nonnegative losses.

Proposition 1 For any probabilistic algorithm of choosing an expert and for
any $\epsilon$ such that $0 < \epsilon < 1$ two experts exist such that
$v_t \to \infty$ as $t \to \infty$ and

    $\mathrm{fluc}(t) \ge 1 - \epsilon$,
    $\frac{1}{v_t}E(s_{1:t} - \min_{i=1,2} s^i_{1:t}) \ge \frac{1}{2}(1-\epsilon)$

for all $t$.



Proof. Given a probabilistic algorithm of choosing an expert and $\epsilon$
such that $0 < \epsilon < 1$, define recursively one-step losses $s^1_t$ and
$s^2_t$ of expert 1 and expert 2 at any step $t = 1, 2, \dots$ as follows. By
$s^1_{1:t}$ and $s^2_{1:t}$ denote the cumulative losses of these experts
incurred at steps $\le t$, let $v_t$ be the corresponding volume, where
$t = 1, 2, \dots$.

Define $v_0 = 1$ and $M_t = 4v_{t-1}/\epsilon$ for all $t \ge 1$. For
$t \ge 1$, define $s^1_t = 0$ and $s^2_t = M_t$ if
$P\{I_t = 1\} < \frac{1}{2}$, and define $s^1_t = M_t$ and $s^2_t = 0$
otherwise, so that the loss $M_t$ always falls on an expert that is followed
with probability at least $\frac{1}{2}$.

Let $s_t$ be the one-step loss of the master algorithm and $s_{1:t}$ be its
cumulative loss at step $t \ge 1$. We have

    $E(s_{1:t}) \ge E(s_t) = s^1_t P\{I_t = 1\} + s^2_t P\{I_t = 2\} \ge \frac{1}{2}M_t$

for all $t \ge 1$. Also, since $v_t = v_{t-1} + M_t = (1 + 4/\epsilon)v_{t-1}$
and $\min_i s^i_{1:t} \le v_{t-1}$, the normalized expected regret of the
master algorithm is bounded from below

    $\frac{1}{v_t}E(s_{1:t} - \min_i s^i_{1:t}) \ge \frac{2/\epsilon - 1}{1 + 4/\epsilon} \ge \frac{1}{2}(1-\epsilon)$

for all $t$. By definition

    $\mathrm{fluc}(t) = \frac{M_t}{v_{t-1} + M_t} = \frac{1}{1 + \epsilon/4} \ge 1 - \epsilon$

for all $t$. △
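
The construction in this proof can be simulated. The sketch below is only
an illustration under stated assumptions: the function `adversarial_game`
and the two sample learners are hypothetical, a learner is represented by a
function returning $P\{I_t = 1\}$ from the two cumulative losses, and the
loss $M_t$ is placed on an expert that is followed with probability at
least 1/2, which is the choice needed for $E(s_t) \ge M_t/2$.

```python
def adversarial_game(prob_expert1, eps, steps):
    """Play the two-expert game of Proposition 1 against a given learner.

    prob_expert1(cum1, cum2) returns P{I_t = 1}.
    Returns a list of pairs (fluc(t), normalized expected regret at step t).
    """
    v = 1.0                      # v_0 = 1
    cum = [0.0, 0.0]             # cumulative losses of the two experts
    expected_loss = 0.0          # E(s_{1:t}) of the learner
    history = []
    for _ in range(steps):
        m = 4.0 * v / eps        # M_t = 4 v_{t-1} / eps
        p1 = prob_expert1(cum[0], cum[1])
        # Put the loss M_t on the expert that is chosen with probability >= 1/2.
        s = (m, 0.0) if p1 >= 0.5 else (0.0, m)
        expected_loss += s[0] * p1 + s[1] * (1.0 - p1)
        cum[0] += s[0]
        cum[1] += s[1]
        v += m                   # v_t = v_{t-1} + M_t
        history.append((m / v, (expected_loss - min(cum)) / v))
    return history


def leader(c1, c2):
    """Follow the current leader; split evenly on ties."""
    return 0.5 if c1 == c2 else (1.0 if c1 < c2 else 0.0)


for fluc, regret in adversarial_game(leader, eps=0.5, steps=5):
    print(round(fluc, 4), round(regret, 4))
```

For every learner the scaled fluctuation stays at $1/(1+\epsilon/4)$ and the
normalized expected regret stays above $\frac{1}{2}(1-\epsilon)$, in agreement
with the proposition.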

Proposition 1 shows that we should impose some restrictions on the
asymptotic behavior of $\mathrm{fluc}(t)$ to prove the asymptotic consistency
of a probabilistic algorithm.






3 The Follow Perturbed Leader algorithm with 
adaptive weights 

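In this section we construct the FPL algorithm with adaptive weights
protected from unbounded one-step losses.

Let $\gamma(t)$ be a computable non-increasing real function such that
$0 < \gamma(t) < 1$ for all $t$ and $\gamma(t) \to 0$ as $t \to \infty$; for
example, $\gamma(t) = 1/t^{\delta}$, where $\delta > 0$. Let also $a$ be a
positive real number. Define

    $\alpha_t = \frac{1}{2}\left(1 - \frac{\ln\frac{a(1+\ln N)}{2(e^{3/a}-1)}}{\ln\gamma(t)}\right)$    (7)

and

    $\mu_t = a(\gamma(t))^{\alpha_t} = \sqrt{\frac{2a(e^{3/a}-1)}{1+\ln N}}\,(\gamma(t))^{1/2}$    (8)

for all $t$, where $e = 2.72\dots$ is the base of the natural logarithm (see
footnote 2).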

Without loss of generality we suppose that $\gamma(t) < \min\{A, A^{-1}\}$
for all $t$, where

    $A = \frac{2(e^{3/a}-1)}{a(1+\ln N)}$.

We can obtain this by choosing an appropriate value of the initial constant
$v_0$. Then $0 < \alpha_t < 1$ for all $t$.
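
The following Python sketch evaluates the exponent $\alpha_t$ and the
weight $\mu_t$ for given $\gamma(t)$, $a$ and $N$. It assumes that
$\alpha_t$ is the minimizer of the corresponding member
$2(e^{3/a}-1)\gamma^{1-\alpha} + a(1+\ln N)\gamma^{\alpha}$ of the sum (42),
as the text explains later; the function name `alpha_mu` and the sample
values are hypothetical.

```python
import math


def alpha_mu(gamma_t, a, n_experts):
    """Return (alpha_t, mu_t) for 0 < gamma_t < 1, a > 0 and n_experts >= 1."""
    big_a = 2.0 * math.expm1(3.0 / a) / (a * (1.0 + math.log(n_experts)))
    # alpha_t = (1/2) * (1 + ln A / ln gamma(t)); this equals the form with
    # ln(1/A) and a minus sign.
    alpha = 0.5 * (1.0 + math.log(big_a) / math.log(gamma_t))
    mu = a * gamma_t ** alpha            # mu_t = a * gamma(t)^alpha_t
    return alpha, mu


alpha, mu = alpha_mu(gamma_t=0.1, a=3.0, n_experts=10)
print(alpha, mu)
```

With these sample values $A \approx 0.35$, so $\gamma(t) = 0.1$ satisfies
$\gamma(t) < \min\{A, A^{-1}\}$ and the computed $\alpha_t$ lies strictly
between 0 and 1.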

We consider an FPL algorithm with a variable learning rate

    $\epsilon_t = \frac{1}{\mu_t v_{t-1}}$,    (9)

where $\mu_t$ is defined by (8) and the volume $v_{t-1}$ depends on the
experts' actions on steps $< t$. By definition $v_t \ge v_{t-1}$ and
$\mu_t \le \mu_{t-1}$ for $t = 1, 2, \dots$. Also, by definition
$\mu_t \to 0$ as $t \to \infty$.

Let $\xi^1_t, \dots, \xi^N_t$, $t = 1, 2, \dots$, be a sequence of i.i.d.
random variables distributed according to the density $p(x) = \exp\{-x\}$.
In what follows we omit the lower index $t$.

We suppose without loss of generality that $s^i_0 = v_0 = 0$ for all $i$ and
$\epsilon_0 = \infty$.

The FPL algorithm is defined as follows:

FPL algorithm PROT.

FOR t = 1, ..., T

Choose an expert with the minimal perturbed cumulated loss on steps $< t$

    $I_t = \mathrm{argmin}_{i=1,2,\dots,N}\{s^i_{1:t-1} - \frac{1}{\epsilon_t}\xi^i\}$.    (10)

Receive one-step losses $s^i_t$ for experts $i = 1, \dots, N$, define
$v_t = v_{t-1} + \max_i |s^i_t|$ and $\epsilon_{t+1}$ by (9).

Receive one-step loss $s_t = s^{I_t}_t$ of the master algorithm.
ENDFOR

2 The choice of the optimal value of $\alpha_t$ will be explained later. It
will be obtained by minimization of the corresponding member of the sum (42).
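
A compact Python sketch of the protocol PROT is given below. It is an
illustration under stated assumptions rather than a reference
implementation: the function name `prot`, the sample game and the sample
function $\gamma$ are hypothetical, the weight $\mu_t$ is taken in its
closed form $\sqrt{2a(e^{3/a}-1)/(1+\ln N)}\,(\gamma(t))^{1/2}$, and fresh
perturbations are drawn at every step.

```python
import math
import random


def prot(losses, gamma, a, seed=0):
    """Sketch of the PROT loop; losses[t-1][i] is the loss of expert i at step t."""
    rng = random.Random(seed)
    n = len(losses[0])
    # Constant factor of mu_t in the closed form mu_t = c * sqrt(gamma(t)).
    c = math.sqrt(2.0 * a * math.expm1(3.0 / a) / (1.0 + math.log(n)))
    cum = [0.0] * n   # cumulative expert losses s^i_{1:t-1}
    v = 0.0           # volume v_{t-1}, with v_0 = 0
    total = 0.0       # cumulative loss of the learner
    for t, step in enumerate(losses, start=1):
        inv_eps = c * math.sqrt(gamma(t)) * v          # 1 / eps_t = mu_t * v_{t-1}
        xi = [rng.expovariate(1.0) for _ in range(n)]  # Exp(1) perturbations
        scores = [s - inv_eps * x for s, x in zip(cum, xi)]
        leader = min(range(n), key=scores.__getitem__)
        total += step[leader]                          # loss of the chosen expert
        cum = [s + x for s, x in zip(cum, step)]
        v += max(abs(x) for x in step)                 # v_t = v_{t-1} + max_i |s^i_t|
    return total


game = [(0.0, 1.0), (2.0, -1.0), (0.5, 3.0), (-4.0, 1.0)] * 5
print(prot(game, gamma=lambda t: (t + 1) ** -0.5, a=3.0, seed=1))
```

Since $v_0 = 0$, the perturbation vanishes at the first step, which
corresponds to the convention $\epsilon_1 = \infty$; afterwards the size of
the perturbation scales with the volume accumulated so far, not with the
number of steps.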

Let $s_{1:T} = \sum_{t=1}^{T} s^{I_t}_t$ be the cumulative loss of the FPL
algorithm on steps $\le T$.

The following theorem shows that if the game is non-degenerate and
$\Delta v_t = o(v_t)$ as $t \to \infty$ with a computable bound then the FPL
algorithm with variable learning rate (9) is asymptotically consistent.

We suppose that the experts are oblivious, i.e., they do not use in their
work random actions of the learning algorithm. The inequality (12) of
Theorem 1 below is reformulated and proved for non-oblivious experts at the
end of this section.

Theorem 1 Let $\gamma(t)$ be a computable non-increasing real function such
that $0 < \gamma(t) < 1$ and

    $\mathrm{fluc}(t) \le \gamma(t)$    (11)

for all $t$. Then for any $\epsilon > 0$ the expected cumulated loss of the
FPL algorithm PROT with variable learning rate (9), where parameter $a$
depends on $\epsilon$, is bounded:

    $E(s_{1:T}) \le \min_i s^i_{1:T} + 2\sqrt{(6+\epsilon)(1+\ln N)}\sum_{t=1}^{T}(\gamma(t))^{1/2}\Delta v_t$    (12)

for all $T$.

In the case of nonnegative unbounded losses $s^i_t \in [0, +\infty)$ we have
a bound

    $E(s_{1:T}) \le \min_i s^i_{1:T} + 2\sqrt{(2+\epsilon)(1+\ln N)}\sum_{t=1}^{T}(\gamma(t))^{1/2}\Delta v_t$.    (13)

Let also the game be non-degenerate and $\gamma(t) \to 0$ as $t \to \infty$.
Then the algorithm PROT is asymptotically consistent in the mean

    $\limsup_{T\to\infty}\frac{1}{v_T}E(s_{1:T} - \min_{i=1,\dots,N} s^i_{1:T}) \le 0$.    (14)

Proof. The proof of this theorem follows the proof-scheme of [8] and [10].

Let $\alpha_t$ be a sequence of real numbers defined by (7); recall that
$0 < \alpha_t < 1$ for all $t$.

The analysis of optimality of the FPL algorithm is based on an intermediate
predictor IFPL (Infeasible FPL) with the learning rate $\epsilon'_t$ defined
by (15).

IFPL algorithm.

FOR t = 1, ..., T

Define the learning rate

    $\epsilon'_t = \frac{1}{\mu_t v_t}$, where $\mu_t = a(\gamma(t))^{\alpha_t}$,    (15)

$v_t$ is the volume of the game at step $t$ and $\alpha_t$ is defined by (7).

Choose an expert with the minimal perturbed cumulated loss on steps $\le t$

    $J_t = \mathrm{argmin}_{i=1,2,\dots,N}\{s^i_{1:t} - \frac{1}{\epsilon'_t}\xi^i\}$.

Receive the one-step loss $s^{J_t}_t$ of the IFPL algorithm.
ENDFOR

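The IFPL algorithm predicts under the knowledge of $s^i_{1:t}$,
$i = 1, \dots, N$ (and $v_t$), which may not be available at the beginning of
step $t$. Using the unknown value of $\epsilon'_t$ is the main distinctive
feature of our version of IFPL.

For any $t$, we have
$I_t = \mathrm{argmin}_i\{s^i_{1:t-1} - \frac{1}{\epsilon_t}\xi^i\}$ and
$J_t = \mathrm{argmin}_i\{s^i_{1:t} - \frac{1}{\epsilon'_t}\xi^i\} = \mathrm{argmin}_i\{s^i_{1:t-1} + s^i_t - \frac{1}{\epsilon'_t}\xi^i\}$.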

The expected one-step and cumulated losses of the FPL and IFPL algorithms
at steps $t$ and $T$ are denoted

    $l_t = E(s^{I_t}_t)$ and $r_t = E(s^{J_t}_t)$,
    $l_{1:T} = \sum_{t=1}^{T} l_t$ and $r_{1:T} = \sum_{t=1}^{T} r_t$,

respectively, where $s^{I_t}_t$ is the one-step loss of the FPL algorithm at
step $t$ and $s^{J_t}_t$ is the one-step loss of the IFPL algorithm, and $E$
denotes the mathematical expectation.

Lemma 1 The cumulated expected losses of the FPL and IFPL algorithms
with learning rates defined by (9) and (15) satisfy the inequality

    $l_{1:T} \le r_{1:T} + 2(e^{3/a}-1)\sum_{t=1}^{T}(\gamma(t))^{1-\alpha_t}\Delta v_t$    (16)

for all $T$, where $\alpha_t$ is defined by (7).

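Proof. Let $c_1, \dots, c_N$ be nonnegative real numbers and

    $m_j = \min_{i\ne j}\{s^i_{1:t-1} - \frac{1}{\epsilon_t}c_i\}$,
    $m'_j = \min_{i\ne j}\{s^i_{1:t} - \frac{1}{\epsilon'_t}c_i\} = \min_{i\ne j}\{s^i_{1:t-1} + s^i_t - \frac{1}{\epsilon'_t}c_i\}$.

Let $m_j = s^{j_1}_{1:t-1} - \frac{1}{\epsilon_t}c_{j_1}$ and
$m'_j = s^{j_2}_{1:t} - \frac{1}{\epsilon'_t}c_{j_2} = s^{j_2}_{1:t-1} + s^{j_2}_t - \frac{1}{\epsilon'_t}c_{j_2}$.
By definition and since $j_2 \ne j$ we have

    $m_j = s^{j_1}_{1:t-1} - \frac{1}{\epsilon_t}c_{j_1} \le s^{j_2}_{1:t-1} - \frac{1}{\epsilon_t}c_{j_2} \le s^{j_2}_{1:t-1} + s^{j_2}_t - \frac{1}{\epsilon_t}c_{j_2} =$    (17)

    $s^{j_2}_{1:t} - \frac{1}{\epsilon'_t}c_{j_2} + \left(\frac{1}{\epsilon'_t} - \frac{1}{\epsilon_t}\right)c_{j_2} = m'_j + \left(\frac{1}{\epsilon'_t} - \frac{1}{\epsilon_t}\right)c_{j_2}$.    (18)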

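We compare conditional probabilities $P\{I_t = j \mid \xi^i = c_i, i \ne j\}$
and $P\{J_t = j \mid \xi^i = c_i, i \ne j\}$.

The following chain of equalities and inequalities is valid:

    $P\{I_t = j \mid \xi^i = c_i, i \ne j\} =$
    $P\{s^j_{1:t-1} - \frac{1}{\epsilon_t}\xi^j \le m_j \mid \xi^i = c_i, i \ne j\} =$
    $P\{\xi^j \ge \epsilon_t(s^j_{1:t-1} - m_j) \mid \xi^i = c_i, i \ne j\} =$
    $P\{\xi^j \ge \epsilon'_t(s^j_{1:t-1} - m_j) + (\epsilon_t - \epsilon'_t)(s^j_{1:t-1} - m_j) \mid \xi^i = c_i, i \ne j\} \le$    (19)
    $P\{\xi^j \ge \epsilon'_t(s^j_{1:t-1} - m_j) + (\epsilon_t - \epsilon'_t)(s^j_{1:t-1} - s^{j_2}_{1:t-1} + \frac{1}{\epsilon_t}c_{j_2}) \mid \xi^i = c_i, i \ne j\} =$    (20)
    $\exp\{-(\epsilon_t - \epsilon'_t)(s^j_{1:t-1} - s^{j_2}_{1:t-1})\} \times$    (21)
    $P\{\xi^j \ge \epsilon'_t(s^j_{1:t-1} - m_j) + (\epsilon_t - \epsilon'_t)\frac{1}{\epsilon_t}c_{j_2} \mid \xi^i = c_i, i \ne j\} \le$    (22)
    $\exp\{-(\epsilon_t - \epsilon'_t)(s^j_{1:t-1} - s^{j_2}_{1:t-1})\} \times P\{\xi^j \ge \epsilon'_t(s^j_{1:t-1} - m'_j - (\frac{1}{\epsilon'_t} - \frac{1}{\epsilon_t})c_{j_2}) + (\epsilon_t - \epsilon'_t)\frac{1}{\epsilon_t}c_{j_2} \mid \xi^i = c_i, i \ne j\} =$    (23)
    $\exp\{-(\epsilon_t - \epsilon'_t)(s^j_{1:t-1} - s^{j_2}_{1:t-1})\} \times P\{\xi^j \ge \epsilon'_t(s^j_{1:t-1} - m'_j) \mid \xi^i = c_i, i \ne j\} =$    (24)
    $\exp\{-(\epsilon_t - \epsilon'_t)(s^j_{1:t-1} - s^{j_2}_{1:t-1}) + \epsilon'_t s^j_t\} \times P\{\xi^j \ge \epsilon'_t(s^j_{1:t} - m'_j) \mid \xi^i = c_i, i \ne j\} =$    (25)
    $\exp\{-(\frac{1}{\mu_t v_{t-1}} - \frac{1}{\mu_t v_t})(s^j_{1:t-1} - s^{j_2}_{1:t-1}) + \frac{s^j_t}{\mu_t v_t}\} \times P\{\xi^j \ge \frac{1}{\mu_t v_t}(s^j_{1:t} - m'_j) \mid \xi^i = c_i, i \ne j\} \le$    (26)
    $\exp\{-\frac{\Delta v_t}{\mu_t v_t}\frac{s^j_{1:t-1} - s^{j_2}_{1:t-1}}{v_{t-1}} + \frac{\Delta v_t}{\mu_t v_t}\} \times P\{\xi^j \ge \frac{1}{\mu_t v_t}(s^j_{1:t} - m'_j) \mid \xi^i = c_i, i \ne j\} =$    (27)
    $\exp\{\frac{\Delta v_t}{\mu_t v_t}\left(1 - \frac{s^j_{1:t-1} - s^{j_2}_{1:t-1}}{v_{t-1}}\right)\} P\{J_t = j \mid \xi^i = c_i, i \ne j\}$.    (28)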



Here the inequality (19)-(20) follows from (17) and
$\epsilon_t \ge \epsilon'_t$. We have used twice, in the change from (20) to
(21) and in the change from (24) to (25), the equality
$P\{\xi \ge a + b\} = e^{-b}P\{\xi \ge a\}$ for any random variable $\xi$
distributed according to the exponential law. The step (22)-(23) follows
from (18). We have used in the change from (26) to (27) the equality
$v_t - v_{t-1} = \Delta v_t$ and the inequality $|s^j_t| \le \Delta v_t$ for
all $j$ and $t$.



The ratio in the exponent (28) is bounded:

    $\left|\frac{s^j_{1:t-1} - s^{j_2}_{1:t-1}}{v_{t-1}}\right| \le 2$,    (29)

since $\left|\frac{s^i_{1:t-1}}{v_{t-1}}\right| \le 1$ for all $t$ and $i$.

Therefore, we obtain

    $P\{I_t = j \mid \xi^i = c_i, i \ne j\} \le$
    $\exp\{\frac{3\Delta v_t}{\mu_t v_t}\} P\{J_t = j \mid \xi^i = c_i, i \ne j\} \le$
    $\exp\{(3/a)(\gamma(t))^{1-\alpha_t}\} P\{J_t = j \mid \xi^i = c_i, i \ne j\}$.    (30)

Since the inequality (30) holds for all $c_i$, it also holds unconditionally:

    $P\{I_t = j\} \le \exp\{(3/a)(\gamma(t))^{1-\alpha_t}\} P\{J_t = j\}$    (31)

for all $t = 1, 2, \dots$ and $j = 1, \dots, N$.

Since $s^j_t + \Delta v_t \ge 0$ for all $j$ and $t$, we obtain from (31)

    $l_t + \Delta v_t = E(s^{I_t}_t + \Delta v_t) = \sum_{j=1}^{N}(s^j_t + \Delta v_t)P(I_t = j) \le$
    $\exp\{(3/a)(\gamma(t))^{1-\alpha_t}\}\sum_{j=1}^{N}(s^j_t + \Delta v_t)P(J_t = j) =$
    $\exp\{(3/a)(\gamma(t))^{1-\alpha_t}\}(E(s^{J_t}_t) + \Delta v_t) =$
    $\exp\{(3/a)(\gamma(t))^{1-\alpha_t}\}(r_t + \Delta v_t) \le$
    $(1 + (e^{3/a}-1)(\gamma(t))^{1-\alpha_t})(r_t + \Delta v_t) =$
    $r_t + \Delta v_t + (e^{3/a}-1)(\gamma(t))^{1-\alpha_t}(r_t + \Delta v_t) \le$
    $r_t + \Delta v_t + 2(e^{3/a}-1)(\gamma(t))^{1-\alpha_t}\Delta v_t$.    (32)

In the last line of (32) we have used the inequality $|r_t| \le \Delta v_t$
for all $t$ and the inequality $\exp\{3r\} \le 1 + (e^3 - 1)r$ for all
$0 \le r \le 1$.



Subtracting $\Delta v_t$ from both sides of the inequality (32) and summing
over $t = 1, \dots, T$, we obtain

    $l_{1:T} \le r_{1:T} + 2(e^{3/a}-1)\sum_{t=1}^{T}(\gamma(t))^{1-\alpha_t}\Delta v_t$

for all $T$. Lemma 1 is proved. △

The following lemma, which is an analogue of the result from [10], gives
a bound for the IFPL algorithm.

Lemma 2 The expected cumulative loss of the IFPL algorithm with the
learning rate (15) is bounded:

    $r_{1:T} \le \min_i s^i_{1:T} + a(1+\ln N)\sum_{t=1}^{T}(\gamma(t))^{\alpha_t}\Delta v_t$    (33)

for all $T$, where $\alpha_t$ is defined by (7).

Proof. The proof is along the lines of the proof from Hutter and Poland [8],
with the exception that now the sequence $\epsilon'_t$ is not monotonic.

Let in this proof $s_t = (s^1_t, \dots, s^N_t)$ be a vector of one-step
losses and $s_{1:t} = (s^1_{1:t}, \dots, s^N_{1:t})$ be a vector of cumulative
losses of the experts algorithms. Also, let $\xi = (\xi^1, \dots, \xi^N)$ be a
vector whose coordinates are random variables.

Recall that $\epsilon'_t = 1/(\mu_t v_t)$, $\mu_t \le \mu_{t-1}$ for all $t$,
and $v_0 = 0$, $\epsilon'_0 = \infty$.

Define $\tilde s_{1:t} = s_{1:t} - \frac{1}{\epsilon'_t}\xi$ for
$t = 1, 2, \dots$. Consider the vector of one-step losses
$\tilde s_t = s_t - \xi\left(\frac{1}{\epsilon'_t} - \frac{1}{\epsilon'_{t-1}}\right)$
for the moment.
For any vector $s$ and a unit vector $d$ denote

    $M(s) = \mathrm{argmin}_{d\in D}\{d \cdot s\}$,

where $D = \{(0, \dots, 1), \dots, (1, \dots, 0)\}$ is the set of $N$ unit
vectors of dimension $N$ and "$\cdot$" is the inner product of two vectors.
We first show that

    $\sum_{t=1}^{T} M(\tilde s_{1:t}) \cdot \tilde s_t \le M(\tilde s_{1:T}) \cdot \tilde s_{1:T}$.    (34)

For $T = 1$ this is obvious. For the induction step from $T - 1$ to $T$ we
need to show that

    $M(\tilde s_{1:T}) \cdot \tilde s_T \le M(\tilde s_{1:T}) \cdot \tilde s_{1:T} - M(\tilde s_{1:T-1}) \cdot \tilde s_{1:T-1}$.

This follows from $\tilde s_{1:T} = \tilde s_{1:T-1} + \tilde s_T$ and

    $M(\tilde s_{1:T}) \cdot \tilde s_{1:T-1} \ge M(\tilde s_{1:T-1}) \cdot \tilde s_{1:T-1}$.



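We rewrite (34) as follows

    $\sum_{t=1}^{T} M(\tilde s_{1:t}) \cdot s_t \le M(\tilde s_{1:T}) \cdot \tilde s_{1:T} + \sum_{t=1}^{T} M(\tilde s_{1:t}) \cdot \xi\left(\frac{1}{\epsilon'_t} - \frac{1}{\epsilon'_{t-1}}\right)$.    (35)

By definition of $M$ we have

    $M(\tilde s_{1:T}) \cdot \tilde s_{1:T} \le M(s_{1:T}) \cdot \left(s_{1:T} - \frac{\xi}{\epsilon'_T}\right) = \min_{d\in D}\{d \cdot s_{1:T}\} - M(s_{1:T}) \cdot \frac{\xi}{\epsilon'_T}$.    (36)

The expectation of the last term in (36) is equal to
$\frac{1}{\epsilon'_T} = \mu_T v_T$.
The second term of (35) can be rewritten

    $\sum_{t=1}^{T}\left(\frac{1}{\epsilon'_t} - \frac{1}{\epsilon'_{t-1}}\right)M(\tilde s_{1:t}) \cdot \xi = \sum_{t=1}^{T}(\mu_t v_t - \mu_{t-1}v_{t-1})M(\tilde s_{1:t}) \cdot \xi$.    (37)

We will use the inequality for mathematical expectation $E$

    $0 \le E(M(\tilde s_{1:t}) \cdot \xi) \le E(M(-\xi) \cdot \xi) = E(\max_i \xi^i) \le 1 + \ln N$.    (38)

The proof of this inequality uses ideas of Lemma 1 from [8].

We have for the exponentially distributed random variables $\xi^i$,
$i = 1, \dots, N$,

    $P\{\max_i \xi^i \ge a\} = P\{\exists i(\xi^i \ge a)\} \le \sum_{i=1}^{N}P\{\xi^i \ge a\} = N\exp\{-a\}$.    (39)

Since for any non-negative random variable $\eta$,
$E(\eta) = \int_0^{\infty}P\{\eta \ge y\}dy$, by (39) we have

    $E(\max_i \xi^i - \ln N) \le \int_0^{\infty}P\{\max_i \xi^i - \ln N \ge y\}dy \le \int_0^{\infty}N\exp\{-y - \ln N\}dy = 1$.

Therefore, $E(\max_i \xi^i) \le 1 + \ln N$.

By (38) the expectation of (37) has the upper bound

    $\sum_{t=1}^{T}E(M(\tilde s_{1:t}) \cdot \xi)(\mu_t v_t - \mu_{t-1}v_{t-1}) \le (1+\ln N)\sum_{t=1}^{T}\mu_t\Delta v_t$.

Here we have used the inequality $\mu_t \le \mu_{t-1}$ for all $t$.

Since $E(\xi^i) = 1$ for all $i$, the expectation of the last term in (36) is
equal to

    $E\left(M(s_{1:T}) \cdot \frac{\xi}{\epsilon'_T}\right) = \frac{1}{\epsilon'_T} = \mu_T v_T$.    (40)

Combining the bounds (35)-(37) and (40), we obtain

    $r_{1:T} = E\left(\sum_{t=1}^{T}M(\tilde s_{1:t}) \cdot s_t\right) \le$
    $\min_i s^i_{1:T} - \mu_T v_T + (1+\ln N)\sum_{t=1}^{T}\mu_t\Delta v_t \le$
    $\min_i s^i_{1:T} + (1+\ln N)\sum_{t=1}^{T}\mu_t\Delta v_t$.    (41)

Lemma is proved. △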
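
The bound $E(\max_i \xi^i) \le 1 + \ln N$ used in (38) can be compared with
the exact value. For $N$ independent exponential variables with density
$\exp\{-x\}$ the expectation of the maximum is the harmonic number
$1 + \frac{1}{2} + \dots + \frac{1}{N}$, a standard fact about order
statistics that is not used in the text. The following Python sketch
(the function name `expected_max_exponential` is hypothetical) makes the
comparison.

```python
import math


def expected_max_exponential(n):
    """Exact E(max of n i.i.d. Exp(1) variables): the n-th harmonic number."""
    return sum(1.0 / k for k in range(1, n + 1))


for n in (1, 2, 10, 100, 1000):
    exact = expected_max_exponential(n)
    bound = 1.0 + math.log(n)
    print(n, round(exact, 4), round(bound, 4))
```

The gap between the two quantities stays below 1 for every $N$, so little
is lost by using the simpler expression $1 + \ln N$ in the regret bounds.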

We finish now the proof of the theorem. 

The inequality (16) of Lemma 1 and the inequality (33) of Lemma 2 imply
the inequality

    $E(s_{1:T}) \le \min_i s^i_{1:T} + \sum_{t=1}^{T}\left(2(e^{3/a}-1)(\gamma(t))^{1-\alpha_t} + a(1+\ln N)(\gamma(t))^{\alpha_t}\right)\Delta v_t$    (42)

for all $T$.

The optimal value (7) of $\alpha_t$ can be easily obtained by minimization of
each member of the sum (42) by $\alpha_t$. In this case $\mu_t$ is equal to
(8) and (42) is equivalent to

    $E(s_{1:T}) \le \min_i s^i_{1:T} + 2\sqrt{2a(e^{3/a}-1)(1+\ln N)}\sum_{t=1}^{T}(\gamma(t))^{1/2}\Delta v_t$,    (43)

where $a$ is a parameter of the algorithm PROT.

Also, for each $\epsilon > 0$ an $a$ exists such that
$2a(e^{3/a}-1) < 6 + \epsilon$. Therefore, we obtain (12).
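
The existence of such an $a$ follows from the expansion
$2a(e^{3/a}-1) = 6 + 9/a + \dots$, which decreases to 6 as $a \to \infty$.
A suitable value can be found numerically, for instance by doubling, as in
the following Python sketch; the function name `choose_a` and the doubling
strategy are assumptions made for the illustration and are not part of the
algorithm PROT.

```python
import math


def choose_a(eps, a=1.0):
    """Return some a with 2a(e^{3/a} - 1) < 6 + eps, found by doubling."""
    # math.expm1(x) computes e^x - 1 accurately for small x = 3/a.
    while 2.0 * a * math.expm1(3.0 / a) >= 6.0 + eps:
        a *= 2.0
    return a


a = choose_a(0.1)
print(a, 2.0 * a * math.expm1(3.0 / a))
```

For $\epsilon = 0.1$ the search stops at $a = 128$, where the constant is
approximately 6.07. A smaller $\epsilon$ forces a larger $a$, roughly of
order $9/\epsilon$.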

We have $\sum_{t=1}^{T}\Delta v_t = v_T$ for all $T$, $v_t \to \infty$ and
$\gamma(t) \to 0$ as $t \to \infty$. Then by the Toeplitz lemma (see Lemma 5
of Section A)

    $\frac{1}{v_T}\sum_{t=1}^{T}(\gamma(t))^{1/2}\Delta v_t \to 0$

as $T \to \infty$. Therefore, the FPL algorithm PROT is asymptotically
consistent in the mean, i.e., the relation (14) of Theorem 1 is proved. △

In the case where all losses are nonnegative, $s^i_t \in [0, +\infty)$, the
inequality (29) can be replaced by

    $\left|\frac{s^j_{1:t-1} - s^{j_2}_{1:t-1}}{v_{t-1}}\right| \le 1$

for all $t$ and $j$. In this case an analysis of the proof of Lemma 1 shows
that the bound (43) can be replaced by

    $E(s_{1:T}) \le \min_i s^i_{1:T} + 2\sqrt{a(e^{2/a}-1)(1+\ln N)}\sum_{t=1}^{T}(\gamma(t))^{1/2}\Delta v_t$,

where $a$ is a parameter of the algorithm PROT.

Since for each $\epsilon > 0$ an $a$ exists such that
$a(e^{2/a}-1) < 2 + \epsilon$, we obtain a version of (12) for nonnegative
losses, the inequality (13).

We study now the Hannan consistency of our algorithm. 

Theorem 2 Assume that all conditions of Theorem 1 hold and

    $\sum_{t=1}^{\infty}(\gamma(t))^2 < \infty$.    (44)

Then the algorithm PROT is Hannan consistent:

    $\limsup_{T\to\infty}\frac{1}{v_T}\left(s_{1:T} - \min_{i=1,\dots,N} s^i_{1:T}\right) \le 0$    (45)

almost surely.

Proof. So far we assumed that the perturbations $\xi^1, \dots, \xi^N$ are
sampled only once at time $t = 0$. This choice was favorable for the
analysis. As is easily seen, under expectation this is equivalent to
generating new perturbations $\xi^1_t, \dots, \xi^N_t$ at each time step $t$;
also, we assume that all these perturbations are i.i.d. for
$i = 1, \dots, N$ and $t = 1, 2, \dots$. Lemmas 1, 2 and Theorem 1 remain
valid for this case. This method of perturbation is needed to prove the
Hannan consistency of the algorithm PROT.

We use some version of the strong law of large numbers to prove the
Hannan consistency of the algorithm PROT.

Proposition 2 Let $g(x)$ be a positive nondecreasing real function such that
$x/g(x)$, $g(x)/x^2$ are non-increasing for $x > 0$ and $g(x) = g(-x)$ for
all $x$. Let the assumptions of Theorem 1 hold and

    $\sum_{t=1}^{\infty}\frac{g(\Delta v_t)}{g(v_t)} < \infty$.    (46)

Then the FPL algorithm PROT is Hannan consistent, i.e., (5) holds as
$T \to \infty$ almost surely.

Proof. The proof is based on the following lemma. 

Lemma 3 Let $a_t$ be a nondecreasing sequence of real numbers such that
$a_t \to \infty$ as $t \to \infty$ and $X_t$ be a sequence of independent
random variables such that $E(X_t) = 0$, for $t = 1, 2, \dots$. Let also
$g(x)$ satisfy the assumptions of Proposition 2. Then the inequality

    $\sum_{t=1}^{\infty}\frac{E(g(X_t))}{g(a_t)} < \infty$    (47)

implies

    $\frac{1}{a_T}\sum_{t=1}^{T}X_t \to 0$    (48)

as $T \to \infty$ almost surely.

The proof of this lemma is given in Section A.






Put $X_t = (s_t - E(s_t))/2$, where $s_t$ is the loss of the FPL algorithm
PROT at step $t$, and $a_t = v_t$ for all $t$. By definition
$|X_t| \le \Delta v_t$ for all $t$. Then (47) is valid, and by (48)

    $\frac{1}{v_T}(s_{1:T} - E(s_{1:T})) = \frac{1}{v_T}\sum_{t=1}^{T}(s_t - E(s_t)) \to 0$

as $T \to \infty$ almost surely. This limit and the limit (14) imply (45). △

By Proposition 2 the algorithm PROT is Hannan consistent, since (44) implies
(46) for $g(x) = x^2$. Theorem 2 is proved. △

Authors of [1] and [13] considered polynomially bounded one-step losses.
We consider a specific example of the bound (42) for the polynomial case.

Corollary 1 Assume that $|s^i_t| \le t^a$ for all $t$ and $i = 1, \dots, N$,
and $v_t \ge t^{a+\delta}$ for all $t$, where $a$ and $\delta$ are positive
real numbers. Let also, in the algorithm PROT, $\gamma(t) = t^{-\delta}$ and
$\mu_t = a(\gamma(t))^{\alpha_t}$, where $\alpha_t$ is defined by (7). Then

• (i) the algorithm PROT is asymptotically consistent in the mean for
any $a > 0$ and $\delta > 0$;

• (ii) this algorithm is Hannan consistent for any $a > 0$ and
$\delta > \frac{1}{2}$;

• (iii) the expected loss of this algorithm is bounded:

    $E(s_{1:T}) \le \min_i s^i_{1:T} + 2\sqrt{(6+\epsilon)(1+\ln N)}\,T^{1-\frac{1}{2}\delta+a}$    (49)

as $T \to \infty$, where $\epsilon > 0$ is a parameter of the algorithm (see
footnote 3).


This corollary follows directly from Theorem 1, where condition (44) of
Theorem 2 holds for $\delta > \frac{1}{2}$.

If $\delta = 1$ the regret from (49) is asymptotically equivalent to the
regret from Allenberg et al. [1] (see Section 1).

For $a = 0$ we have the case of a bounded loss function ($|s^i_t| \le 1$ for
all $i$ and $t$). The FPL algorithm PROT is asymptotically consistent in the
mean if $v_t \ge \beta(t)$ for all $t$, where $\beta(t)$ is an arbitrary
positive unbounded non-decreasing computable function (we can take
$\gamma(t) = 1/\beta(t)$ in this case). This algorithm is Hannan consistent
if (44) holds, i.e.

    $\sum_{t=1}^{\infty}(\beta(t))^{-2} < \infty$.

For example, this condition is satisfied for $\beta(t) = t^{1/2}\ln t$.

3 Recall that given $\epsilon$ we tune the parameter $a$ of the algorithm PROT.

Theorem 1 is also valid for the standard time scaling, i.e., when $v_T = T$
for all $T$, and when losses of experts are bounded, i.e., $a = 0$. In this
case $\Delta v_t = 1$ and we can take $\gamma(t) = 1/t$. Then for any
$\epsilon > 0$ the expected regret has the upper bound

    $2\sqrt{(6+\epsilon)(1+\ln N)}\sum_{t=1}^{T}(\gamma(t))^{1/2} \le 4\sqrt{(6+\epsilon)(1+\ln N)T}$,

which is similar to bounds from [8] and [10].

Let us show that the bound (12) of Theorem 1 that holds against oblivious
experts also holds against non-oblivious (adaptive) ones.

In the non-oblivious case, it is natural to generate at each time step $t$ of
the algorithm PROT a new vector of perturbations
$\xi_t = (\xi^1_t, \dots, \xi^N_t)$; $\xi_0$ is the empty set. Also, it is
assumed that all these perturbations are i.i.d. according to the exponential
distribution $P$, where $i = 1, \dots, N$ and $t = 1, 2, \dots$. Denote
$\xi_{1:t} = (\xi_1, \dots, \xi_t)$.

Non-oblivious experts can react at each time step $t$ on past decisions
$s_1, s_2, \dots, s_{t-1}$ of the FPL algorithm and on values of
$\xi_1, \dots, \xi_{t-1}$.

Therefore, losses of experts and regret depend now on random perturbations:

    $s^i_t = s^i_t(\xi_{1:t-1})$, $i = 1, \dots, N$,
    $\Delta v_t = \Delta v_t(\xi_{1:t-1})$,

where $t = 1, 2, \dots$.



In non-oblivious case, condition (11) is a random event. We assume in 



Theorem [T] that in the game of prediction with expert advice regulated by 
the FPL-protocol the event 

fluc(t) < j(t) for all t 

holds almost surely. 

An analysis of the proof of Theorem 1 shows that in the non-oblivious case 
the bound (12) is an inequality for the random variable 

$$\sum_{t=1}^{T} E(s_t) - \min_i s^i_{1:T} - 2\sqrt{(6+\epsilon)(1+\ln N)} \sum_{t=1}^{T} (\gamma(t))^{1/2} \Delta v_t \le 0, \qquad (50)$$

which holds almost surely with respect to the product distribution $P^{T-1}$, 
where the loss $s_t$ of the FPL algorithm depends on the random perturbation $\xi_t$ 
at step $t$ and on the losses of all experts at steps $< t$. Also, $E$ is the expectation 
with respect to $P$. 

Taking the expectation $E_{1:T-1}$ with respect to the product distribution $P^{T-1}$, 
we obtain a version of (12) for the non-oblivious case: 

$$E_{1:T}\Big(s_{1:T} - \min_i s^i_{1:T} - 2\sqrt{(6+\epsilon)(1+\ln N)} \sum_{t=1}^{T} (\gamma(t))^{1/2} \Delta v_t\Big) \le 0$$

for all $T$. 



4 An example: zero-sum experts 

In this section we present an example of a game where the losses of experts 
cannot be bounded in advance [20]. Let $S = S(t)$ be a function representing 
the evolution of a stock price. Two experts will represent two competing methods 
of buying and selling shares of this stock. 

Let $M$ and $T$ be positive integers and let the time interval $[0, T]$ 
be divided into a large number $M$ of subintervals. Define a discrete time series 
of stock prices 

$$S_0 = S(0),\ S_1 = S(T/M),\ S_2 = S(2T/M),\ \dots,\ S_M = S(T). \qquad (51)$$

In this paper, volatility is an informal notion. We say that the difference 
$(S_T - S_0)^2$ represents the macro volatility and the sum $\sum_{i=0}^{T-1} (\Delta S_i)^2$, where 
$\Delta S_i = S_{i+1} - S_i$, $i = 0, \dots, T-1$, represents the micro volatility of the time 
series (51). 

The game between an investor and the market looks as follows: the 
investor can use both long and short selling. At the beginning of time step $t$ Investor 
purchases the number $C_t$ of shares of the stock at the price $S_t$ each. At the end of 
the trading period the market discloses the price $S_{t+1}$ of the stock, and the 
investor incurs his current income or loss $s_t = C_t \Delta S_t$ for the period $t$. We have 
the following equality 

$$(S_T - S_0)^2 = \Big(\sum_{t=0}^{T-1} \Delta S_t\Big)^2 = \sum_{t=0}^{T-1} 2(S_t - S_0)\Delta S_t + \sum_{t=0}^{T-1} (\Delta S_t)^2. \qquad (52)$$

Fig. 1. Evolution of a stock price 



The equality (52) leads to two strategies for the investor, which are 
represented by two experts. At the beginning of step $t$ Experts 1 and 2 hold the 
number of shares 

$$C_t^1 = 2C(S_t - S_0), \qquad (53)$$

$$C_t^2 = -C_t^1, \qquad (54)$$

where $C$ is an arbitrary positive constant. 

These strategies at step $t$ earn the incomes $s^1_t = 2C(S_t - S_0)\Delta S_t$ and 
$s^2_t = -s^1_t$. Summing $s^1_t$ over the increments $t = 0, \dots, T-1$ and applying (52), 
the strategy (53) earns in the first $T$ steps of the game the income 

$$s^1_{1:T} = \sum_{t=0}^{T-1} s^1_t = C\Big((S_T - S_0)^2 - \sum_{t=0}^{T-1} (\Delta S_t)^2\Big).$$

Fig. 2. Fluctuation of the game 



The strategy (54) earns in the first $T$ steps the income $s^2_{1:T} = -s^1_{1:T}$. 

The number of shares $C_t^1$ in the strategy (53), or the number of shares $C_t^2 = 
-C_t^1$ in the strategy (54), can be positive or negative. The one-step gains 
$s^1_t$ and $s^2_t = -s^1_t$ are unbounded and can be positive or negative: 
$s^i_t \in (-\infty, +\infty)$. 
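The two zero-sum experts can be simulated directly from a price path. The sketch below is an illustration under stated assumptions: the function names, the list-based price representation, and the sample numbers are hypothetical and not taken from the paper. It computes the one-step gains of strategies (53) and (54) together with the macro and micro volatilities, so that the relation implied by identity (52) can be checked numerically.

```python
def zero_sum_expert_gains(prices, C=1.0):
    """One-step gains (s1, s2) of the experts (53) and (54).

    prices is the series S_0, ..., S_T; step t uses the increment
    Delta S_t = S_{t+1} - S_t, for t = 0, ..., T-1.
    """
    s0 = prices[0]
    s1 = []
    for t in range(len(prices) - 1):
        delta = prices[t + 1] - prices[t]
        shares = 2.0 * C * (prices[t] - s0)   # holding of expert 1, eq. (53)
        s1.append(shares * delta)
    s2 = [-g for g in s1]                     # expert 2 holds the opposite position
    return s1, s2


def macro_volatility(prices):
    """(S_T - S_0)^2."""
    return (prices[-1] - prices[0]) ** 2


def micro_volatility(prices):
    """Sum of squared increments (Delta S_t)^2."""
    return sum((prices[t + 1] - prices[t]) ** 2 for t in range(len(prices) - 1))


if __name__ == "__main__":
    path = [10.0, 11.0, 9.0, 12.0]            # hypothetical price path
    s1, s2 = zero_sum_expert_gains(path, C=1.0)
    # Summing 2C(S_t - S_0) * Delta S_t and applying (52) gives C * (macro - micro).
    print(sum(s1), macro_volatility(path) - micro_volatility(path))
```

On this small path the micro volatility exceeds the macro volatility, so the first expert loses and the second one gains the same amount.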

Informally speaking, the first strategy will show a large return if 
$(S_T - S_0)^2 \gg \sum_{i=0}^{T-1} (\Delta S_i)^2$; the second one will show a large return when 
$(S_T - S_0)^2 \ll \sum_{i=0}^{T-1} (\Delta S_i)^2$. There is an uncertainty domain for these strategies, 
i.e., the case when neither $\gg$ nor $\ll$ holds. The idea of these strategies is based on the 
paper of Cheridito [3] (see also Rogers [15], Delbaen and Schachermayer), 
who have constructed arbitrage strategies for a financial market that 
consists of a money market account and a stock whose price follows a fractional 
Brownian motion with drift or an exponential fractional Brownian motion 
with drift. Vovk [18] has reformulated these strategies for discrete time. We 
use these strategies to define a mixed strategy which incurs a gain when the macro 
and micro volatilities of the time series differ. There is no uncertainty domain 
for continuous time. 

Fig. 3. Two symmetric solid lines: gains of the two zero-sum strategies; dotted line: 
expected gain of the algorithm PROT; dashed line: volume of the game 

We analyze this game in the decision theoretic online learning (DTOL) 
framework [6]. We introduce a Learner that can choose between the two strategies 
(53) and (54). To change from the follow-the-leader framework to DTOL we 
derandomize the FPL algorithm PROT.* We interpret the expected one-step 
gain $E(s_t)$ as the weighted average of the one-step gains of the experts' strategies. 
In more detail, at each step $t$, Learner divides his investment in proportion 
to the probabilities of the expert strategies (53) and (54) computed by the FPL 
algorithm and receives the gain 

$$G_t = 2C(S_t - S_0)\big(P\{I_t = 1\} - P\{I_t = 2\}\big)\Delta S_t$$

at any step $t$, where $C$ is an arbitrary positive constant; $G_{1:T} = \sum_{t=1}^{T} G_t = 
E(s_{1:T})$ is the Learner's cumulative gain. 

*To apply Theorem 1 we interpret gain as a negative loss. 
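The derandomized Learner can be sketched for the two-expert case. The code below is a minimal illustration, not the PROT algorithm itself. It assumes that the leader maximises the cumulative gain plus an independent Exp(1) perturbation divided by a learning rate, and it takes that learning rate from a hypothetical callable `eps_schedule`, because the tuning used by PROT is defined earlier in the paper and is not reproduced here. With two experts, the difference of the two exponential perturbations has a standard Laplace distribution, so $P\{I_t = 1\}$ has a closed form.

```python
import math


def prob_expert1_leads(cum_gain1, cum_gain2, eps):
    """P{I_t = 1} when the leader maximises cum_gain_i + xi_i / eps.

    xi_1, xi_2 are assumed i.i.d. Exp(1); expert 1 leads iff
    xi_1 - xi_2 > d with d = eps * (cum_gain2 - cum_gain1), and
    xi_1 - xi_2 follows the standard Laplace distribution.
    """
    d = eps * (cum_gain2 - cum_gain1)
    if d >= 0:
        return 0.5 * math.exp(-d)
    return 1.0 - 0.5 * math.exp(d)


def learner_gains(prices, eps_schedule, C=1.0):
    """One-step gains G_t of the derandomized Learner on a price path.

    eps_schedule(t) is a hypothetical stand-in for the learning rate.
    """
    s0 = prices[0]
    cum1 = 0.0                                  # cumulative gain of expert 1
    gains = []
    for t in range(len(prices) - 1):
        p1 = prob_expert1_leads(cum1, -cum1, eps_schedule(t))
        delta = prices[t + 1] - prices[t]
        s1 = 2.0 * C * (prices[t] - s0) * delta  # gain of expert 1 at step t
        gains.append((2.0 * p1 - 1.0) * s1)      # (P{I_t=1} - P{I_t=2}) * s1
        cum1 += s1                               # expert 2 has cumulative gain -cum1
    return gains


if __name__ == "__main__":
    print(learner_gains([10.0, 11.0, 9.0, 12.0], lambda t: 0.5))
```

Once one expert is ahead, the probabilities shift toward it, so the Learner's one-step gain takes the sign of the leading expert's gain at that step.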

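Assume that $|s^1_t| = o\big(\sum_{i=1}^{t} |s^1_i|\big)$ as $t \to \infty$. Let $\gamma(t) = \mu$ for all $t$, where 
$\mu$ is an arbitrarily small positive number. Then for any $\epsilon > 0$ 

$$G_{1:T} \ge \Big|\sum_{t=1}^{T} s^1_t\Big| - 2\mu^{1/2}\sqrt{(6+\epsilon)(1+\ln N)}\,(v_T + v_0)$$

for all sufficiently large $T$, and for some $v_0 > 0$. 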

Under the conditions of Theorem 1 we show that the strategy of the algorithm PROT 
is "defensive" in some weak sense: 

$$G_{1:T} - \Big|\sum_{t=1}^{T} s^1_t\Big| \ge -o(v_T)$$

as $T \to \infty$. 



5 Conclusion 

In this paper we extend methods of the theory of prediction with 
expert advice to the case where the experts' one-step gains cannot be bounded 
in advance. The traditional measures of performance do not work in the general 
unbounded case. To measure the asymptotic performance of our algorithm, 
we replace the traditional time scale with a volume scale. New notions of the volume 
of a game and the scaled fluctuation of a game are introduced in this paper. 
In the case of two zero-sum experts the volume corresponds to the sum of all 
transactions between the experts. 

Using the notion of the scaled fluctuation of a game, we can define very 
broad classes of games (experts) for which our algorithm PROT is 
asymptotically consistent in the modified sense. Moreover, the restrictions on such games 
are formulated in relative terms: the logarithmic derivative of the volume of 
the game must be $o(t)$ as $t \to \infty$. 

A motivating example of a game with two zero-sum experts from Section 4 
shows some practical significance of this problem. The FPL algorithm with 
variable learning rates is simple to implement, and it gives satisfactory 
experimental results when prices follow fractional Brownian motion. 

There are some open problems for further research. It would be useful to 
analyze the performance of the well-known algorithms from the DTOL framework 
(like "Hedge" [6] or "Normal Hedge" [2]) for the case of unbounded losses in 
terms of the volume of a game. 

There is a gap between Proposition 1 and Theorem 1, since we assume 
in this theorem that the game satisfies $\mathrm{fluc}(t) \le \gamma(t) \to 0$, where $\gamma(t)$ is 
computable. Also, the function $\gamma(t)$ is a parameter of our algorithm PROT. 
Does there exist an asymptotically consistent learning algorithm in the case 
where $\mathrm{fluc}(t) \to 0$ as $t \to \infty$ and where the function $\gamma(t)$ is not a parameter 
of this algorithm? 

A partial solution is based on applying the "doubling trick" method to an 
increasing sequence of nonnegative functions $\gamma_i(t)$ such that $\gamma_i(t) \to 0$ as $t \to \infty$ 
and $\gamma_i(t) \le \gamma_{i+1}(t)$ for all $i$ and $t$. In this case a modified algorithm PROT 
is asymptotically consistent in the mean in any game such that 

$$\limsup_{t \to \infty} \frac{\mathrm{fluc}(t)}{\gamma_i(t)} < \infty$$

for some $i$. 

In this paper we consider only the full information case. An analysis of 
these problems under partial monitoring is a subject for further research. 



A Proof of Lemma 3 



The proof of Lemma 3 is based on Kolmogorov's theorem on three series and 
its corollaries. For completeness of presentation we reconstruct the proof 
from Petrov [13] (Chapter IX, Section 2). 

For any random variable $X$ and a positive number $c$ denote 

$$X^c = \begin{cases} X & \text{if } |X| \le c, \\ 0 & \text{otherwise.} \end{cases}$$



Kolmogorov's theorem on three series says: 

For any sequence of independent random variables $X_t$, $t = 1, 2, \dots$, the 
following implications hold: 

If the series $\sum_{t=1}^{\infty} X_t$ is convergent almost surely, then the series 
$\sum_{t=1}^{\infty} E X_t^c$, $\sum_{t=1}^{\infty} D X_t^c$ and $\sum_{t=1}^{\infty} P\{|X_t| > c\}$ are convergent for each 
$c > 0$, where $E$ is the mathematical expectation and $D$ is the variance. 

The series $\sum_{t=1}^{\infty} X_t$ is convergent almost surely if all these series are 
convergent for some $c > 0$. 

See Shiryaev [16] for the proof. 

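Assume the conditions of Lemma 3 hold. We will prove that 

$$\sum_{t=1}^{\infty} \frac{E g(X_t)}{g(a_t)} < \infty \qquad (55)$$

implies that the series 

$$\sum_{t=1}^{\infty} \frac{X_t}{a_t} \qquad (56)$$

is convergent almost surely. From this, by Kronecker's lemma (Lemma 5, see below), 
$\frac{1}{a_T} \sum_{t=1}^{T} X_t \to 0$ almost surely. 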

Let $V_t$ be the distribution function of the random variable $X_t$. Since $g$ 
is non-decreasing, 

$$P\{|X_t| > a_t\} \le \int_{|x| > a_t} \frac{g(x)}{g(a_t)}\, dV_t(x) \le \frac{E g(X_t)}{g(a_t)}.$$

Then by (55) 

$$\sum_{t=1}^{\infty} P\{|X_t| > a_t\} < \infty. \qquad (57)$$

Denote 



$$Z_t = \begin{cases} X_t & \text{if } |X_t| \le a_t, \\ 0 & \text{otherwise.} \end{cases}$$

By definition, $x^2/g(x) \le a_t^2/g(a_t)$ for $|x| \le a_t$. Rearranging, we obtain 
$x^2/a_t^2 \le g(x)/g(a_t)$ for these $x$. Therefore, 

$$E Z_t^2 = \int_{|x| \le a_t} x^2\, dV_t(x) \le \frac{a_t^2}{g(a_t)} \int_{|x| \le a_t} g(x)\, dV_t(x) \le \frac{a_t^2}{g(a_t)} E g(X_t).$$

By (55) we obtain 

$$\sum_{t=1}^{\infty} \frac{E Z_t^2}{a_t^2} < \infty. \qquad (58)$$

Since $E X_t = \int x\, dV_t(x) = 0$, 

$$|E Z_t| = \Big|\int_{|x| > a_t} x\, dV_t(x)\Big| \le \frac{a_t}{g(a_t)} \int_{|x| > a_t} g(x)\, dV_t(x) \le \frac{a_t}{g(a_t)} E g(X_t).$$

By (55) 

$$\sum_{t=1}^{\infty} \frac{|E Z_t|}{a_t} \le \sum_{t=1}^{\infty} \frac{E g(X_t)}{g(a_t)} < \infty. \qquad (59)$$

From (57)-(59) and the theorem on three series we obtain (56). 



We have used the Toeplitz and Kronecker lemmas. 

Lemma 4 (Toeplitz) Let $x_t$ be a sequence of real numbers and $b_t$ be a 
sequence of nonnegative real numbers such that $a_t = \sum_{i=1}^{t} b_i \to \infty$, $x_t \to x$ and 
$|x| < \infty$. Then 

$$\frac{1}{a_t} \sum_{i=1}^{t} b_i x_i \to x. \qquad (60)$$



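Proof. For any $\epsilon > 0$ there exists $t_\epsilon$ such that $|x_t - x| < \epsilon$ for all $t \ge t_\epsilon$. Then 

$$\Big|\frac{1}{a_t} \sum_{i \le t} b_i (x_i - x)\Big| \le \frac{1}{a_t} \sum_{i < t_\epsilon} |b_i (x_i - x)| + \epsilon$$

for all $t \ge t_\epsilon$. Since $a_t \to \infty$, we obtain (60). 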



Lemma 5 (Kronecker) Assume $\sum_{t=1}^{\infty} x_t < \infty$ and $a_t \uparrow \infty$. Then 
$\frac{1}{a_t} \sum_{i=1}^{t} a_i x_i \to 0$. 

The proof is a straightforward corollary of the Toeplitz lemma. 
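Kronecker's lemma can be illustrated numerically. The sketch below uses the hypothetical choice $x_i = 1/i^2$ (a convergent series) and $a_i = i$ (increasing to infinity), both picked here only for illustration. In this case the weighted average $\frac{1}{a_t} \sum_{i \le t} a_i x_i$ equals the harmonic number $H_t$ divided by $t$, which tends to zero.

```python
def kronecker_average(x, a):
    """Return the values (1 / a_t) * sum_{i <= t} a_i * x_i for t = 1, ..., len(x)."""
    out = []
    acc = 0.0
    for xi, ai in zip(x, a):
        acc += ai * xi          # running weighted sum
        out.append(acc / ai)    # normalise by the current a_t
    return out


n = 10000
x = [1.0 / (i * i) for i in range(1, n + 1)]   # sum of x_i converges
a = [float(i) for i in range(1, n + 1)]        # a_t increases to infinity
avg = kronecker_average(x, a)

if __name__ == "__main__":
    for t in (10, 100, 1000, 10000):
        print(t, avg[t - 1])
```

The printed values decrease toward zero as $t$ grows, in line with the lemma.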